Forensic automatic speaker recognition: fiction or science?
Abstract
Hollywood films and CSI-like series depict a technology landscape far removed from reality, both in forensic speaker recognition and in other identification-of-the-source forensic areas. Lay persons are used to good-looking scientist-investigators performing voice identifications ("we got a match!") or to fancy smart devices producing voice transformations that make one actor instantaneously talk with the voice of another. As a result, juries and fact-finders overestimate what present technology can achieve in court, a phenomenon that has been properly investigated and has become known as the CSI effect [18]. Simultaneously, Forensic Identification Science is facing a global challenge [17], driven firstly by progressively higher requirements for the admissibility of expert testimony in court [8], and secondly by the transparent and testable nature of DNA typing, which is now seen as the new gold-standard model of a scientifically defensible approach to be emulated by all other identification-of-the-source areas [2]. The likelihood ratio (LR) approach to the analysis of forensic evidence [9][1], popularised by DNA typing in recent decades, can be used in any forensic identification discipline to report the strength of the evidence. The LR avoids conclusions in the form of an absolute decision of authorship; instead, it expresses the degree of support for an incriminating hypothesis versus an exonerating alternative hypothesis, which avoids typical interpretation fallacies [1][7]. The reported LR value can then be used by the court in combination with prior information on the case and other possible evidence. It is the role of the judge or jury to determine whether a decision "beyond reasonable doubt" can be taken, based not only on the speech evidence but on all the available information in the case [12].
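As a rough illustration of how a reported LR combines with prior information, the odds form of Bayes' theorem states that posterior odds equal the LR multiplied by the prior odds. The sketch below is a minimal example under that assumption only; the function name `posterior_probability` and the numbers used are hypothetical and do not come from the paper.

```python
def posterior_probability(likelihood_ratio: float, prior_probability: float) -> float:
    """Combine a likelihood ratio with a prior probability (odds form of Bayes' theorem).

    Hypothetical helper for illustration; assumes 0 < prior_probability < 1
    and likelihood_ratio > 0.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    # Convert the prior probability to odds in favour of the incriminating hypothesis.
    prior_odds = prior_probability / (1.0 - prior_probability)
    # The scientist supplies the LR; the prior belongs to the court.
    posterior_odds = likelihood_ratio * prior_odds
    # Convert the posterior odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Illustrative values only: LR = 100 with a prior of 1%.
    print(posterior_probability(100.0, 0.01))
```

With these illustrative values the posterior probability is only about 0.50, which shows why the same LR can lead to very different conclusions depending on the prior information that the court, not the scientist, brings to the case.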
It is not the scientist's role to decide on the ultimate issue of guilt or innocence, but only to provide information about the speech evidence at hand. Forensic Speaker Recognition has had a sordid history (see [14] for an excellent review), which has provoked controversial positions about its use in court [4][5]. Current forensic approaches, used by several official labs and practitioners, are usually based on single or combined auditory, acoustic and semiautomatic methods [13]. However, some steps forward are possible (and are being undertaken [16][10]) in order to comply with the current requirements of forensic science [7] and admissibility. Although Automatic Speaker Recognition has improved dramatically in the last decade [19] (see [3] for an overview of the state of the art), as shown in the yearly NIST Speaker Recognition Evaluations (SRE) held since 1996, detection performance is not error-free, especially in forensic recordings, where low quality (noise, distortions, etc.) and severe mismatch (channel, speaking style, etc.) between control and recovered samples are usual. Fortunately, the objective in forensic speaker recognition is not to make decisions about the identity of the speaker [7][11] but to estimate reliable information about the strength of the evidence, which can properly be expressed in the form of LR values. Moreover, systems and procedures eliciting LR values can be calibrated [6], as demonstrated by leading systems in SRE'06 and SRE'08 [3] and recently by some traditional phonetic approaches [11][15], reducing the calibration loss for any prior probability (which depends on the forensic case at hand) at the given discrimination loss. However, caution is needed to select an adequate reference population in each case, because mismatch between the recording conditions of the population at hand and those of the questioned and suspect speech can lead to likelihood ratios that do not accurately represent the weight of the evidence.
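To make the notion of calibration more concrete, one metric widely used with SRE-style systems is the log-likelihood-ratio cost (Cllr), which penalises LRs that point in the wrong direction and is prior-independent. The sketch below assumes that Cllr is the kind of cost the text has in mind, which the abstract does not state explicitly; the function name `cllr` and the sample LR values are hypothetical.

```python
import math
from typing import Sequence


def cllr(target_lrs: Sequence[float], nontarget_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost for a set of LRs with known ground truth.

    target_lrs: LRs from same-speaker comparisons (should be large).
    nontarget_lrs: LRs from different-speaker comparisons (should be small).
    Hypothetical helper for illustration; all LRs must be positive.
    """
    if not target_lrs or not nontarget_lrs:
        raise ValueError("both LR lists must be non-empty")
    # Same-speaker trials are penalised when the LR is small.
    target_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in target_lrs) / len(target_lrs)
    # Different-speaker trials are penalised when the LR is large.
    nontarget_cost = sum(math.log2(1.0 + lr) for lr in nontarget_lrs) / len(nontarget_lrs)
    return 0.5 * (target_cost + nontarget_cost)


if __name__ == "__main__":
    # Illustrative values only, not results from any evaluation.
    print(cllr([20.0, 50.0, 8.0], [0.05, 0.2, 0.01]))
    # A system that always reports LR = 1 carries no information.
    print(cllr([1.0], [1.0]))
```

A system that always reports LR = 1 scores exactly 1.0 under this cost, so values below 1 indicate that the LRs carry useful information, and misleading LRs (large for different-speaker trials, small for same-speaker trials) are penalised heavily.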
Putting all the above together, we will show in this presentation how forensic speaker recognition can comply with the requirements of transparency and testability in forensic science [11]. This will lead to fulfilling the court requirements about role separation between scientists and judges/juries, and bring about integration in a forensically adequate framework in which the scientist provides the appropriate information necessary to the court’s decision processes.
Related resources
Forensic automatic speaker recognition
Automatic speaker recognition technology appears to have reached a sufficient level of maturity for realistic application in the field of forensic science. However, there are key issues to be solved before the forensic community will accept its use as an investigative assistant or as evidence in actual criminal cases. To assess the state of the technology, the Federal Bureau of Investigation (F...
Automatic Speaker Recognition for Forensic Case Assessment and Interpretation
Forensic speaker recognition (FSR) is the process of determining if a specific individual (suspected speaker) is the source of a questioned voice recording (trace). The forensic expert's role is to testify to the worth of the voice evidence by using, if possible, a quantitative measure of this worth. It is up to the judge and/or the jury to use this information as an aid to their deli...
On compensation of mismatched recording conditions in the Bayesian approach for forensic automatic speaker recognition.
This paper deals with a procedure to compensate for mismatched recording conditions in forensic speaker recognition, using a statistical score normalization. Bayesian interpretation of the evidence in forensic automatic speaker recognition depends on three sets of recordings in order to perform forensic casework: reference (R) and control (C) recordings of the suspect, and a potential populatio...
The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications.
In this paper, we analyse mismatched technical conditions in training and testing phases of speaker recognition and their effect on forensic human and automatic speaker recognition. We use perceptual tests performed by non-experts and compare their performance with that of a baseline automatic speaker recognition system. The degradation of the accuracy of human recognition in mismatched recordi...
On the Use of Automatic Speaker
In forensic applications of speaker recognition it is necessary to be able to specify a confidence level for a decision that two sets of recordings have been produced by the same speaker (or by different speakers). Forensic phoneticians are sometimes incriminated because they find it impossible to provide 'hard' estimates of the confidence level of an expert opinion. In this paper it is investigated...
Comparison of speaker recognition systems on a real forensic benchmark
This paper analyses the performance of several automatic speaker recognition systems using a real forensic database. The systems evaluated have been tested or are currently in use by forensic institutes. A comprehensive error analysis is performed in order to assess each system's behaviour on real casework. We further investigate compensation techniques aimed at minimising the performance g...
Publication date: 2008